@YaSuenag
Member

@YaSuenag YaSuenag commented Oct 19, 2025

jhsdb jstack --mixed does not work when it attaches to a process running with -Xcomp.

This was reported by @pchilano in #27728. You can reproduce the problem with Test.java (attached to the JBS issue). You will see the following stack:

----------------- 646689 -----------------
"Thread-0" #24 prio=5 tid=0x00007f1cec18c890 nid=646689 waiting on condition [0x00007f1cd0158000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
   JavaThread state: _thread_blocked
0x00007f1cf3b7f462 __syscall_cancel_arch + 0x32
0x00007f1cf3b7375c __internal_syscall_cancel + 0x5c
0x00007f1cf3b766a8 ___pthread_cond_timedwait + 0x178
0x00007f1cf270e1e9 PlatformEvent::park_nanos(long) + 0x119
0x00007f1cf2005f4c JavaThread::sleep_nanos(long) + 0xfc
0x00007f1cf218789f JVM_SleepNanos + 0x28f
0x00007f1cdb95f299 java.lang.Thread.sleepNanos0(long) + 0x99 (Native method)

Thread.sleepNanos0 appears as the bottom frame, but the stack actually has more call frames. You can see them with -XX:+PreserveFramePointer:

----------------- 646841 -----------------
"Thread-0" #24 prio=5 tid=0x00007f4a0018c9e0 nid=646841 waiting on condition [0x00007f49e4fd7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
   JavaThread state: _thread_blocked
0x00007f4a0aa29462 __syscall_cancel_arch + 0x32
0x00007f4a0aa1d75c __internal_syscall_cancel + 0x5c
0x00007f4a0aa206a8 ___pthread_cond_timedwait + 0x178
0x00007f4a0970e1e9 PlatformEvent::park_nanos(long) + 0x119
0x00007f4a09005f4c JavaThread::sleep_nanos(long) + 0xfc
0x00007f4a0918789f JVM_SleepNanos + 0x28f
0x00007f49ef961099 java.lang.Thread.sleepNanos0(long) + 0x99 (Native method)
0x00007f49e7f477b4 * java.lang.Thread.sleepNanos(long) bci:33 line:509 (Compiled frame)
0x00007f49e7f41a64 * java.lang.Thread.sleep(long) bci:25 line:540 (Compiled frame)
0x00007f49e7f4037c * Test.run() bci:3 line:6 (Compiled frame)
0x00007f49ef943328 * java.lang.Thread.runWith(java.lang.Object, java.lang.Runnable) bci:5 line:1487 (Compiled frame)
                        * java.lang.Thread.run() bci:19 line:1474 (Compiled frame)
0x00007f49ef3385fd <StubRoutines (initial stubs)>
0x00007f4a08fc247e JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) + 0x4ce
0x00007f4a08fc2bb3 JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*) + 0x2d3
0x00007f4a08fc31bb JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*) + 0xab
0x00007f4a09185590 thread_entry(JavaThread*, JavaThread*) + 0xd0
0x00007f4a09004206 JavaThread::thread_main_inner() + 0x256
0x00007f4a09c66747 Thread::call_run() + 0xb7
0x00007f4a096fccc8 thread_native_entry(Thread*) + 0x128
0x00007f4a0aa20f54 start_thread + 0x2e4

A Java frame might use the frame pointer register (RBP on AMD64) as a general-purpose register, so SA cannot rely on it for stack unwinding.

The hs_err log contains a mixed stack trace under "Native frames". It is unwound by NativeStackPrinter in HotSpot, which handles mixed frames correctly. NativeStackPrinter uses frame::next_frame() to find the sender frame regardless of whether the current frame is a Java frame or a C frame, and it uses the sender FP/PC to create the sender frame. SA, on the other hand, separates CFrame and VFrame when unwinding in mixed mode jstack, so the sender FP/PC is not propagated to the CFrame, and the frames beneath a Java frame might not be shown.

Unifying the unwinders in SA's PStack is difficult, so it is reasonable to propagate the sender FP/PC to the sender of the CFrame.
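
The idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the SA code: `MixedUnwindSketch`, `Frame`, `fpSender`, and `metadataSender` are hypothetical names invented for illustration. A native frame's sender can be found from the frame pointer, while a compiled Java frame's sender is assumed to be known only to the Java-side walker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of mixed-mode unwinding; none of these types exist in SA.
final class MixedUnwindSketch {
    // fpSender: what following the frame pointer yields (null when RBP was reused).
    // metadataSender: what the Java-side walker computes for a compiled frame.
    record Frame(String name, boolean isJava, Frame fpSender, Frame metadataSender) {}

    static Frame nativeFrame(String name, Frame sender) {
        return new Frame(name, false, sender, null);
    }

    static Frame compiledFrame(String name, Frame sender) {
        // Assumption: the frame pointer register holds an unrelated value here.
        return new Frame(name, true, null, sender);
    }

    // Builds a shortened version of the stack from the description, top frame first.
    static Frame sampleStack() {
        Frame f = nativeFrame("start_thread", null);
        f = nativeFrame("JavaCalls::call_helper", f);
        f = compiledFrame("Test.run()", f);
        f = compiledFrame("java.lang.Thread.sleep(long)", f);
        f = nativeFrame("JVM_SleepNanos", f);
        return nativeFrame("PlatformEvent::park_nanos(long)", f);
    }

    // Walks using the frame pointer only, so it stops at the first compiled frame.
    static List<String> walkFramePointerOnly(Frame top) {
        List<String> names = new ArrayList<>();
        for (Frame f = top; f != null; f = f.fpSender()) {
            names.add(f.name());
        }
        return names;
    }

    // Walks using the sender handed over by the Java-side walker for Java frames.
    static List<String> walkPropagatingSender(Frame top) {
        List<String> names = new ArrayList<>();
        for (Frame f = top; f != null; f = f.isJava() ? f.metadataSender() : f.fpSender()) {
            names.add(f.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(walkFramePointerOnly(sampleStack()));
        System.out.println(walkPropagatingSender(sampleStack()));
    }
}
```

In this model the first walk ends at java.lang.Thread.sleep(long), which mirrors the truncated trace above, while the second walk reaches start_thread.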


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8370176: Mixed mode jhsdb jstack cannot unwind call stack with -Xcomp (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27885/head:pull/27885
$ git checkout pull/27885

Update a local copy of the PR:
$ git checkout pull/27885
$ git pull https://git.openjdk.org/jdk.git pull/27885/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27885

View PR using the GUI difftool:
$ git pr show -t 27885

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27885.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 19, 2025

👋 Welcome back ysuenaga! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 19, 2025

@YaSuenag This change is no longer ready for integration - check the PR body for details.

@openjdk

openjdk bot commented Oct 19, 2025

@YaSuenag The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@YaSuenag
Member Author

/issue JDK-8370176

@openjdk openjdk bot changed the title from "Mixed mode jhsdb jstack cannot unwind call stack with -Xcomp" to "8370176: Mixed mode jhsdb jstack cannot unwind call stack with -Xcomp" Oct 19, 2025
@openjdk

openjdk bot commented Oct 19, 2025

@YaSuenag The primary solved issue for a PR is set through the PR title. Since the current title does not contain an issue reference, it will now be updated.

@YaSuenag YaSuenag marked this pull request as ready for review October 19, 2025 06:59
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 19, 2025
@mlbridge

mlbridge bot commented Oct 19, 2025

Webrevs

* @test
* @bug 8370176
* @requires vm.hasSA
* @requires os.family == "linux"
Contributor

Do Windows and OSX have a similar problem that should be fixed also?

Member Author

@YaSuenag YaSuenag Oct 21, 2025

This problem occurs in mixed mode (PStack) only, so we need to skip OSX because, as you mentioned, mixed mode is not supported on OSX.

On Windows, I'm not sure, but I guess we need to consider UNWIND_INFO to unwind call frames correctly, like DWARF on Linux. However, that has not been done yet. So we can regard mixed mode as unsupported on Windows too, which is why I didn't add Windows here.
https://learn.microsoft.com/cpp/build/exception-handling-x64

In fact, I could not see the whole stack in mixed mode, as shown below. It works in normal mode (without --mixed), of course. (I tested on Windows 11 x64 with an upstream JDK built with VS 2022.)

----------------- 13 -----------------
"Reference Handler" #15 daemon prio=10 tid=0x00000207280b9f70 nid=12684 waiting on condition [0x000000aaf6aff000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_blocked
0x00007fffa6b45844      ntdll!NtWaitForAlertByThreadId + 0x14
0x00000000ffffffff              ????????

Member

@RealFYang RealFYang Oct 23, 2025

Note that this new test seems to fail even on Linux systems without pstack. This happens on both my AMD64 machine running Debian 12 and my ARM64 machine running Ubuntu 22.04.4.

Member Author

Can you share the .jtr file?

Member

@RealFYang RealFYang Oct 23, 2025

Sure. This is what I got on my amd64 machine:

$ make test TEST="serviceability/sa/TestJhsdbJstackMixedWithXComp.java"

TestJhsdbJstackMixedWithXComp.jtr.txt

Member Author

@RealFYang Thank you for sharing it!

I think it might be caused by a difference in the binaries. At least it is not caused by this PR. So I think we can move forward with this PR. Does that make sense?

Your .jtr file implies that stack unwinding failed at a libc function, as shown below:

----------------- 2310034 -----------------
"SteadyStateThread" #39 prio=5 tid=0x00007fd2600358a0 nid=2310034 waiting for monitor entry [0x00007fd2351f4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
   JavaThread state: _thread_blocked
0x00007fd267930f16      __futex_abstimed_wait_common + 0xc6
----------------- 2310033 -----------------
"ForkJoinPool-1-worker-2" #38 daemon prio=5 tid=0x00007fd1ec006600 nid=2310033 runnable [0x00007fd2352f5000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
0x00007fd26797a545      __clock_nanosleep + 0x65
0x00007fd26797ee53      __GI___nanosleep + 0x13

Native stack unwinding on Linux AMD64 depends on DWARF (on AArch64, it still depends on FP (x29)).
I downloaded and checked libc.so.6 in libc6-udeb_2.41-12_amd64.udeb. It has an .eh_frame section, which would be used by DwarfParser, but it has no symbols and no .gnu_debuglink ELF section. OTOH, Fedora 43, on which I confirmed the test works, has both symbols and .gnu_debuglink.

They are used for symbol resolution, not stack unwinding. However, other differences in the binary might affect stack unwinding. Thus I think this problem is not caused by this PR.

Member

@RealFYang RealFYang Oct 24, 2025

Hi, I am just wondering if there is a workaround for these platforms. Or can we simply skip this test on them? Say, if this depends on the availability of pstack, maybe we can add a check for that. Otherwise, we may introduce test noise for people who use them.

Member Author

I could reproduce the problem not only on Ubuntu 22.04 but also on 23.04. However, it did not happen on Ubuntu 24.04.
According to your report, the problem also happens on AArch64, which implies the problem is not only in the DWARF parser. (The DWARF parser is only available on Linux AMD64 so far.)

AFAICS stack unwinding fails at a function in glibc (on Ubuntu 22.04 and 23.04 at least), so I suspect something is wrong with the glibc binary, its behavior, and/or the compiler options used on Ubuntu, but I'm not sure.

I checked the glibc version with gnu_get_libc_version(). It returns "2.37" on Ubuntu 23.04 and "2.39" on Ubuntu 24.04. So I think the test can call gnu_get_libc_version() via FFM at its start and skip itself if it runs on glibc 2.38 or earlier. Is that OK?

I grepped the test directory for "mixed" and found other tests (TestJhsdbJstackMixed.java, TestJhsdbJstackPrintVMLocks.java). I will add the glibc check to them under another ticket if this solution is OK.
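
As a rough illustration of the proposed gate, here is a minimal sketch of the version comparison only. `GlibcVersionGate` and `isAtLeast` are hypothetical names, not code from this PR, and the version string is assumed to have been obtained already from gnu_get_libc_version(); the FFM call itself is left out.

```java
// Hypothetical helper; the real test may structure this check differently.
final class GlibcVersionGate {
    // Returns true when a "major.minor[.patch]" string is at least major.minor.
    static boolean isAtLeast(String version, int major, int minor) {
        String[] parts = version.trim().split("\\.");
        int actualMajor = Integer.parseInt(parts[0]);
        int actualMinor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return actualMajor > major || (actualMajor == major && actualMinor >= minor);
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for the value returned by glibc.
        for (String version : new String[] {"2.37", "2.39"}) {
            String action = isAtLeast(version, 2, 39) ? "run" : "skip";
            System.out.println("glibc " + version + ": " + action + " the test");
        }
    }
}
```

With the 2.39 threshold, the Ubuntu 23.04 value ("2.37") leads to a skip and the Ubuntu 24.04 value ("2.39") lets the test run, matching the rule of skipping on glibc 2.38 or earlier.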

Member

I am not sure about the glibc version as I don't know much about the differences among these distributions.
But it works for me if you want to fix all the affected tests in another PR. Thanks for considering that.

Member Author

I updated this PR to check the glibc version in TestJhsdbJstackMixedWithXComp.java, which this PR adds. The test is skipped on Ubuntu 22.04, while it works on Fedora 43, as expected.
I first attempted to add this check to SATestUtils, but that seems difficult because native access would have to be allowed for all users of SATestUtils, and the impact is too significant.

I will file another issue after this PR to apply this check to the other tests that use jhsdb jstack --mixed.

Contributor

@plummercj plummercj left a comment

Changes look good. Thanks for fixing this.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 22, 2025
@RealFYang
Member

@YaSuenag :
Hi, I tried the test on linux-riscv64, and it seems this platform has the same issue.
Would you mind adding this add-on fix for this platform, please? Thanks.
8370176-riscv64.diff.txt

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 23, 2025
@YaSuenag
Member Author

@RealFYang Thanks a lot for sharing a patch for RISC-V! Merged to this PR.

@YaSuenag
Member Author

@plummercj Thanks a lot for your review!

I'm trying to fix mixed mode on Windows. I think we can unwind native stacks with this change, but it is not enough for Java frames. I think we will be able to see all of them with further modification after this PR.

----------------- 13 -----------------
"Reference Handler" #15 daemon prio=10 tid=0x000001a8df240270 nid=4800 waiting on condition [0x0000002207fff000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_blocked
0x00007fff725a5844      ntdll!NtWaitForAlertByThreadId + 0x14
0x00007fff7244c30b      ntdll!RtlSleepConditionVariableCS + 0x14b
0x00007fff6f6ca688      KERNELBASE!SleepConditionVariableCS + 0x38
0x00007ffed580c92d      jvm!PlatformMonitor::wait + 0x3d
0x00007ffed5796f3f      jvm!Monitor::wait + 0x15f
0x00007ffed5448310      jvm!JVM_WaitForReferencePendingList + 0xb0
0x000001a8ce87fa18      <interpreter> native method entry point (kind = native)
0x000001a8df240710              ????????
0x0000002207fffa38              ????????
0x000001a8df240270              ????????
0x000001a8e000bd98              ????????
0x000001a8ce87f39b      <interpreter> native method entry point (kind = native)
0xfffffffffffffff7              ????????
0x000000003e871a28              ????????
0x0000000000000003              ????????
0x000000003e2054f8              ????????
